18 research outputs found

    A Coreset-based, Tempered Variational Posterior for Accurate and Scalable Stochastic Gaussian Process Inference

    We present a novel stochastic variational Gaussian process ($\mathcal{GP}$) inference method, based on a posterior over a learnable set of weighted pseudo input-output points (coresets). Instead of a free-form variational family, the proposed coreset-based, variational tempered family for $\mathcal{GP}$s (CVTGP) is defined in terms of the $\mathcal{GP}$ prior and the data likelihood, and hence accommodates the modeling inductive biases. We derive CVTGP's lower bound for the log-marginal likelihood via marginalization of the proposed posterior over latent $\mathcal{GP}$ coreset variables, and show it is amenable to stochastic optimization. CVTGP reduces the learnable parameter size to $\mathcal{O}(M)$, enjoys numerical stability, and maintains $\mathcal{O}(M^3)$ time and $\mathcal{O}(M^2)$ space complexity, by leveraging a coreset-based tempered posterior that, in turn, provides sparse and explainable representations of the data. Results on simulated and real-world regression problems with Gaussian observation noise validate that CVTGP provides better evidence lower-bound estimates and predictive root mean squared error than alternative stochastic $\mathcal{GP}$ inference methods.
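    To illustrate where the $\mathcal{O}(M^3)$ time and $\mathcal{O}(M^2)$ space figures come from, the following is a minimal sketch, not the CVTGP bound or the authors' implementation. It assumes a Gaussian likelihood, for which raising the likelihood of a pseudo-point to a weight `beta[m]` is equivalent to dividing its noise variance by `beta[m]`. The RBF kernel, the function names, and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def coreset_predict(x_star, x_core, y_core, beta, noise_var=0.1):
    """GP prediction conditioned on M weighted pseudo-points.

    Assumption: a Gaussian likelihood tempered by beta[m] behaves like
    a Gaussian likelihood with noise variance noise_var / beta[m].
    """
    K_mm = rbf_kernel(x_core, x_core)          # M x M matrix: O(M^2) memory
    K_sm = rbf_kernel(x_star, x_core)
    A = K_mm + np.diag(noise_var / beta)
    L = np.linalg.cholesky(A)                  # Cholesky factor: O(M^3) time
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_core))
    mean = K_sm @ alpha
    v = np.linalg.solve(L, K_sm.T)
    var = rbf_kernel(x_star, x_star).diagonal() - np.sum(v**2, axis=0)
    return mean, var

# Hypothetical coreset: M = 5 pseudo-points, each weighted by beta = 3.
x_core = np.linspace(0.0, 1.0, 5)
y_core = np.sin(2.0 * np.pi * x_core)
beta = np.full(5, 3.0)
mean, var = coreset_predict(np.array([0.25, 0.5]), x_core, y_core, beta)
```

    In this sketch the learnable quantities (pseudo-inputs, pseudo-outputs, and weights) each scale linearly with M, which is consistent with the $\mathcal{O}(M)$ parameter count stated in the abstract.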

    Characterizing physiological and symptomatic variation in menstrual cycles using self-tracked mobile health data

    The menstrual cycle is a key indicator of overall health for women of reproductive age. Previously, menstruation was primarily studied through survey results; however, as menstrual tracking mobile apps become more widely adopted, they provide an increasingly large, content-rich source of menstrual health experiences and behaviors over time. By exploring a database of user-tracked observations from the Clue app by BioWink of over 378,000 users and 4.9 million natural cycles, we show that self-reported menstrual tracker data can reveal statistically significant relationships between per-person cycle length variability and self-reported qualitative symptoms. A concern for self-tracked data is that they reflect not only physiological behaviors, but also the engagement dynamics of app users. To mitigate such potential artifacts, we develop a procedure to exclude cycles lacking user engagement, thereby allowing us to better distinguish true menstrual patterns from tracking anomalies. We uncover that women located at different ends of the menstrual variability spectrum, based on the consistency of their cycle length statistics, exhibit statistically significant differences in their cycle characteristics and symptom tracking patterns. We also find that cycle and period length statistics are stationary over the app usage timeline across the variability spectrum. The symptoms that we identify as showing statistically significant association with timing data can be useful to clinicians and users for predicting cycle variability from symptoms or as potential health indicators for conditions like endometriosis. Our findings showcase the potential of longitudinal, high-resolution self-tracked data to improve understanding of menstruation and women's health as a whole. Comment: The Supplementary Information for this work, as well as the code required for data pre-processing and producing results is available in https://github.com/iurteaga/menstrual_cycle_analysi

    Learning endometriosis phenotypes from patient-generated data

    Endometriosis is a systemic and chronic condition in women of childbearing age, yet a highly enigmatic disease with unresolved questions: there are no known biomarkers, nor established clinical stages. We here investigate the use of patient-generated health data and data-driven phenotyping to characterize endometriosis patient subtypes, based on their reported signs and symptoms. We aim at unsupervised learning of endometriosis phenotypes using self-tracking data from personal smartphones. We leverage data from an observational research study of over 4000 women with endometriosis who track their condition over more than 2 years. We extend a classical mixed-membership model to accommodate the idiosyncrasies of the data at hand, i.e., the multimodality and uncertainty of the self-tracked variables. The proposed method, by jointly modeling a wide range of observations (i.e., participant symptoms, quality of life, treatments), identifies clinically relevant endometriosis subtypes. Experiments show that our method is robust to different hyperparameter choices and the biases of self-tracking data (e.g., the wide variations in tracking frequency among participants). With this work, we show the promise of unsupervised learning of endometriosis subtypes from self-tracked data, as learned phenotypes align well with what is already known about the disease, but also suggest new clinically actionable findings. More generally, we argue that a continued research effort on unsupervised phenotyping methods with patient-generated health data via new mobile and digital technologies will have significant impact on the study of enigmatic diseases in particular, and health in general.

    Sequential Monte Carlo for inference of latent ARMA time-series with innovations correlated in time

    We consider the problem of sequential inference of latent time-series with innovations correlated in time and observed via nonlinear functions. We accommodate time-varying phenomena with diverse properties by means of a flexible mathematical representation of the data. We statistically characterize such time-series by a Bayesian analysis of their densities. The density that describes the transition of the state from time t to the next time instant t+1 is used to implement novel sequential Monte Carlo (SMC) methods. We present a set of SMC methods for inference of latent ARMA time-series with innovations correlated in time, under different assumptions about knowledge of the parameters. The methods operate in a unified and consistent manner for data with diverse memory properties. We show the validity of the proposed approach by comprehensive simulations of the challenging stochastic volatility model.
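    As a rough illustration of how a transition density drives an SMC method, the sketch below runs a generic bootstrap particle filter on a simplified stochastic volatility model. It is not the method proposed in the paper: it assumes a latent AR(1) log-volatility with uncorrelated innovations and known parameters, and the function names and the values of `phi` and `sigma` are hypothetical.

```python
import math
import random

def simulate_sv(T, phi=0.95, sigma=0.3, seed=0):
    """Simulate x_t = phi*x_{t-1} + sigma*u_t and y_t = exp(x_t/2)*v_t."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sigma / math.sqrt(1.0 - phi**2))]
    for _ in range(1, T):
        x.append(phi * x[-1] + sigma * rng.gauss(0.0, 1.0))
    y = [math.exp(xt / 2.0) * rng.gauss(0.0, 1.0) for xt in x]
    return x, y

def bootstrap_filter(y, n_particles=500, phi=0.95, sigma=0.3, seed=1):
    """Return filtered posterior-mean estimates of the latent state x_t."""
    rng = random.Random(seed)
    stationary_sd = sigma / math.sqrt(1.0 - phi**2)
    particles = [rng.gauss(0.0, stationary_sd) for _ in range(n_particles)]
    means = []
    for t, yt in enumerate(y):
        if t > 0:
            # Propagate each particle through the transition density.
            particles = [phi * p + sigma * rng.gauss(0.0, 1.0)
                         for p in particles]
        # Log-weight: log N(y_t; 0, exp(x_t)), up to an additive constant.
        logw = [-0.5 * (p + yt * yt * math.exp(-p)) for p in particles]
        top = max(logw)
        w = [math.exp(lw - top) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * p for wi, p in zip(w, particles)))
        # Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means

x_true, y_obs = simulate_sv(50)
x_hat = bootstrap_filter(y_obs, n_particles=200)
```

    Handling ARMA dynamics or innovations correlated in time would require a different transition density in the propagation step; the weighting and resampling steps would keep the same structure.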